Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks. Issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to evaluate the performance of various hyperparameters. In this work, we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods, reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost the evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.
An important unresolved challenge in the theory of regularization is to set the regularization coefficients of popular techniques such as the ElasticNet with general provable guarantees. We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances, a setting that encompasses both cross-validation and multi-task hyperparameter optimization. We obtain a novel structural result for the ElasticNet that characterizes the loss, as a function of the tuning parameters, as a piecewise rational function with algebraic boundaries. We use this to bound the structural complexity of the regularized loss functions and show generalization guarantees for tuning the ElasticNet regression coefficients in the statistical setting. We also consider the more challenging setting of online learning, where we show vanishing average expected regret relative to the optimal parameter pair. We further extend our results to tuning classification algorithms obtained by thresholding regression fits regularized by Ridge, LASSO, or ElasticNet. Our results are the first general learning-theoretic guarantees for this important class of problems that avoid strong assumptions on the data distribution. Furthermore, our guarantees hold for both validation and popular information criterion objectives.
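For concreteness, the ElasticNet fit whose two regularization coefficients are tuned above can be written in its standard textbook form (this formulation is not quoted from the paper):

```latex
\hat{\beta}(\lambda_1, \lambda_2)
  = \arg\min_{\beta \in \mathbb{R}^d}
      \; \lVert y - X\beta \rVert_2^2
      + \lambda_1 \lVert \beta \rVert_1
      + \lambda_2 \lVert \beta \rVert_2^2
```

Setting \lambda_1 = 0 recovers Ridge regression and \lambda_2 = 0 recovers the LASSO; the structural result above characterizes the validation loss as a piecewise rational function of the pair (\lambda_1, \lambda_2).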
Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks such as image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite for evaluating methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far afield from their original development domain. To reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for more robust NAS evaluation of the kind NAS-Bench-360 enables by demonstrating that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also illustrate how NAS-Bench-360 and its associated precomputed results can enable future scientific discoveries by testing several recent hypotheses promoted in the NAS literature. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.
Tuning hyperparameters is an important but laborious part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and to perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, via a novel connection to neural architecture search techniques based on weight sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning; it is applicable to widely used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that FedEx correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget.
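To make the weight-sharing connection concrete, here is a minimal sketch of a FedEx-style round. It assumes a categorical distribution over candidate configurations (e.g. local learning rates) that is updated with an exponentiated-gradient step from clients' validation losses; the client interface, the baseline-subtracted gradient estimate, and the step size are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def fedex_round(theta, configs, clients, server_model, eta=0.1):
    """One FedEx-style tuning round (hedged sketch, not the paper's code).

    theta:        probabilities over the candidate configs (shared across clients)
    configs:      candidate hyperparameter configs, e.g. local learning rates
    clients:      objects assumed to expose local_train(model, config) -> (update, val_loss)
    server_model: current global model parameters (numpy array)
    eta:          exponentiated-gradient step size (assumed value)
    """
    updates, losses, choices = [], [], []
    for client in clients:
        k = np.random.choice(len(configs), p=theta)  # each client samples a config
        update, val_loss = client.local_train(server_model, configs[k])
        updates.append(update)
        losses.append(val_loss)
        choices.append(k)

    # FedAvg-style aggregation of the local updates.
    server_model = server_model + sum(updates) / len(updates)

    # Exponentiated-gradient step: configs with lower-than-average validation
    # loss gain probability mass (REINFORCE-style estimate, our assumption).
    grad = np.zeros_like(theta)
    baseline = float(np.mean(losses))
    for k, loss in zip(choices, losses):
        grad[k] += (loss - baseline) / max(theta[k], 1e-8)
    theta = theta * np.exp(-eta * grad / len(clients))
    return theta / theta.sum(), server_model
```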
An important goal of AutoML is to automate neural network design on new tasks in under-explored domains. Motivated by this goal, we study the problem of enabling users to discover the right neural operations given data from their specific domain. We introduce a search space of operations called XD-Operations that mimic the inductive bias of standard multi-channel convolutions while being much more expressive: we prove that they include many named operations across multiple application areas. Starting from any standard backbone such as ResNet, we show how to transform it into a search space over XD-operations and how to traverse the space using a simple weight-sharing scheme. On a diverse set of tasks (solving PDEs, distance prediction for protein folding, and music modeling), our approach consistently yields models with lower error than baseline networks, and often with lower error than expert-designed domain-specific approaches.
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of similar pairs with each other to be higher on average than with negative samples. The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same set of latent classes. Our generalization bound also shows that learned representations can reduce (labeled) sample complexity on downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
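The similar-pairs-versus-negatives objective described here can be instantiated with a simple logistic surrogate; the following is our own minimal sketch of one such loss, a generic instance of the framework rather than code from the paper:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f_x, f_pos, f_negs):
    """Logistic contrastive loss: push <f(x), f(x+)> above <f(x), f(x-)>.

    f_x, f_pos: (batch, dim) representations of semantically similar pairs
    f_negs:     (batch, n_neg, dim) representations of negative samples
    """
    pos = (f_x * f_pos).sum(dim=-1, keepdim=True)   # (batch, 1) similar-pair scores
    neg = torch.einsum('bd,bnd->bn', f_x, f_negs)   # (batch, n_neg) negative scores
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```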
Neural networks have achieved impressive results on many technological and scientific tasks. Yet, their empirical successes have outpaced our fundamental understanding of their structure and function. By identifying mechanisms driving the successes of neural networks, we can provide principled approaches for improving neural network performance and develop simple and effective alternatives. In this work, we isolate the key mechanism driving feature learning in fully connected neural networks by connecting neural feature learning to the average gradient outer product. We subsequently leverage this mechanism to design Recursive Feature Machines (RFMs), which are kernel machines that learn features. We show that RFMs (1) accurately capture features learned by deep fully connected neural networks, (2) close the gap between kernel machines and fully connected networks, and (3) surpass a broad spectrum of models including neural networks on tabular data. Furthermore, we demonstrate that RFMs shed light on recently observed deep learning phenomena such as grokking, lottery tickets, simplicity biases, and spurious features. We provide a Python implementation to make our method broadly accessible (GitHub: https://github.com/aradha/recursive_feature_machines).
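The average gradient outer product (AGOP) at the heart of this mechanism has a compact definition; the sketch below is an illustrative reconstruction (not the linked GitHub implementation), estimating it for a scalar predictor over a dataset:

```python
import numpy as np

def average_gradient_outer_product(grad_f, X):
    """Estimate M = (1/n) * sum_i grad f(x_i) grad f(x_i)^T.

    grad_f: maps a point x of shape (d,) to the gradient of the predictor
            at x, shape (d,); for multi-output models the Jacobian rows
            would be accumulated instead (an assumption of this sketch).
    X:      (n, d) data matrix.
    """
    n, d = X.shape
    M = np.zeros((d, d))
    for x in X:
        g = grad_f(x)
        M += np.outer(g, g)
    return M / n

# An RFM-style loop would alternate kernel regression with AGOP updates,
# re-weighting the kernel's distance as ||x - z||_M = sqrt((x-z) @ M @ (x-z))
# (a hedged description of the recursion, not the authors' exact recipe).
```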
Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive with billions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a $k$-nearest-neighbor classifier. Without any training, pre-training, or fine-tuning, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also performs particularly well in few-shot settings where labeled data are too scarce for DNNs to achieve satisfying accuracy.
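The method is simple enough to sketch in a few lines. The following uses the normalized compression distance, the paper's core ingredient; details such as the concatenation scheme and tie-breaking here are our own choices, not necessarily the paper's:

```python
import gzip
import numpy as np

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts under gzip."""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))  # concatenation scheme assumed
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(test_text, train_texts, train_labels, k=3):
    """Label a test document by majority vote among its k NCD-nearest neighbors."""
    dists = [ncd(test_text, t) for t in train_texts]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)  # simple tie-breaking (assumed)
```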
Deep Learning (DL) models tend to perform poorly when the data comes from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection helps to identify such data samples, increasing the model's reliability. Recent works have developed DL-based OOD detection that achieves promising results on 2D medical images. However, scaling most of these approaches to 3D images is computationally intractable. Furthermore, the current 3D solutions struggle to achieve acceptable results in detecting even synthetic OOD samples. Such limited performance might indicate that DL often inefficiently embeds large volumetric images. We argue that using the intensity histogram of the original CT or MRI scan as the embedding is descriptive enough to run OOD detection. Therefore, we propose a histogram-based method that requires no DL and achieves almost perfect results in this domain. Our proposal is supported in two ways: we evaluate performance on publicly available datasets, where our method scores 1.0 AUROC in most setups, and we place second in the Medical Out-of-Distribution challenge without fine-tuning or exploiting task-specific knowledge. After carefully discussing its limitations, we conclude that our method solves sample-level OOD detection on 3D medical images in the current setting.
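A minimal version of the histogram embedding is easy to state; the scoring rule below (distance to the nearest in-distribution histogram) is one simple detector we assume for illustration, not necessarily the one used in the paper, and the bin count and intensity range are likewise illustrative:

```python
import numpy as np

def intensity_histogram(volume, bins=100, value_range=(-1000.0, 1000.0)):
    """Embed a 3D CT/MRI scan as its normalized intensity histogram.
    Bin count and intensity range are illustrative assumptions."""
    hist, _ = np.histogram(volume.ravel(), bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def ood_score(test_volume, train_histograms):
    """Score a test scan by its distance to the closest in-distribution
    histogram; larger scores indicate likely OOD."""
    h = intensity_histogram(test_volume)
    dists = np.linalg.norm(train_histograms - h, axis=1)
    return float(dists.min())
```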